Received: by cheltenham.cs.arizona.edu; Fri, 14 May 1993 10:39:05 MST
Date: Fri, 14 May 93 12:10:26 CDT
From: "Richard L. Goerwitz" <goer@midway.uchicago.edu>
Message-Id: <9305141710.AA04578@midway.uchicago.edu>
To: icon-group@cs.arizona.edu
Subject: string stripping
Status: R
Errors-To: icon-group-errors@cs.arizona.edu
This is a long posting, but I hope it will be helpful. We have a lot
of quiet newbies online here, and they should be let in on the fun. For
Sean it might be a bit simplistic, seeing as he's been using Icon for a
while. Bear with me Sean, and I'll offer you several distinct answers
to your question!
Sean states:
    If i have a string and i want to strip out certain characters
    (internal ones, so trim won't do it), ...
This is a good problem, and I'll bet that everyone who programs with
strings a lot has had to perform this chore. It's a perfect set-up for
a discussion of string scanning.
Here's your code:
    # assume you want to remove hyphens and periods
    full := "abc-def.ghi"
    bare := ""
    badchars := '-.'
    every c := !full do
        # test whether the intersection of c and badchars is empty
        if *(c ** badchars) = 0 then bare ||:= c
This is perfectly good code, and nicely formatted. The only problem
with it I can see is that it doesn't take advantage of string scanning,
and its elegance/optimizations. Let's use our full, bare, and badchars
variables above, and try again using scanning:
    full ? {
        while bare ||:= tab(upto(badchars)) do
            tab(many(badchars))
        bare ||:= tab(0)
    }
This actually subsumes a lot of stuff. Assume that our current position
is initially 1, and that position 0 means the end of the string (note the
distinct use of the zero offset in Icon, allowing us not to worry about
string lengths):
    1) append everything in full, from our current position up to the
       next occurrence of a badchar, to bare; if there are no badchars
       left in full, do step 4
    2) go past however many badchars we have found, and reset our
       current position to just before the next "good" char
    3) then do 1 again
    4) append everything from the current position to the end of the
       string to bare
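The four steps above can be packaged as a procedure. What follows is only a
sketch: the name strip_chars and the main wrapper are my own assumptions for
illustration, not anything built into Icon or taken from Sean's code.

```icon
# Sketch only: strip_chars is a hypothetical name, not an Icon built-in.
procedure strip_chars(s, badchars)
    local bare
    bare := ""
    s ? {
        # steps 1-3: copy up to the next bad char, then skip the run of bad chars
        while bare ||:= tab(upto(badchars)) do
            tab(many(badchars))
        # step 4: copy whatever remains after the last bad char
        bare ||:= tab(0)
    }
    return bare
end

procedure main()
    write(strip_chars("abc-def.ghi", '-.'))
end
```

Keeping bare local to the procedure means the caller never has to remember
to reset it to the empty string before scanning.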
Scanning, once you get the feel for it, is pretty clear, and usually very
elegant. And although people don't use Icon, Prolog, LISP, etc. for their
blinding speed, the scanning process is optimized, so you get relatively
good performance. Note that if you prefer a less thoroughly Iconish solu-
tion to your problem, you could write the following as a kind of compro-
mise:
    every c := !full do
        # keep c only if it is not one of the badchars
        if not any(badchars, c) then bare ||:= c
But here again, the temptation to use Icon's backtracking features for more
general expression of the problem is almost irresistible:
    every bare ||:= !full -- badchars
I don't find this as clear as the scanning method, so I'd probably want to
prepend a comment:
    # bare defined as full stripped of badchars
    every bare ||:= !full -- badchars
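Since the one-liner is terse, here is a small sketch spelling out what I
believe happens at each step. The main wrapper and the explicit parentheses
are additions for illustration; the prefix ! binds more tightly than the
infix --, so the parentheses only make the grouping visible.

```icon
procedure main()
    local full, badchars, bare
    full := "abc-def.ghi"
    badchars := '-.'
    bare := ""
    # !full generates the one-character strings "a", "b", "c", "-", ...
    # Each is converted to a cset, and badchars is subtracted from it,
    # leaving an empty cset when the character was a bad one.
    # Appending converts the cset back to a string ("" if it was empty).
    every bare ||:= (!full) -- badchars
    write(bare)
end
```

Note that bare must start out as the empty string, just as in the scanning
version.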
If you want, you can use encapsulation via recursive function calls to get
rid of the assignment altogether:
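    procedure stripbads(full, badchars)
        # base case: nothing left to strip
        if *full = 0 then return ""
        # keep the first char unless it is a badchar, then recurse on the rest
        return (full[1] -- badchars) || stripbads(full[2:0], badchars)
    end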
This is starting to look LISPish, though, and I personally find it verbose
and somewhat opaque.
Note that I haven't tested any of this code, so if you find any bugs, please
don't be too shocked. My internal Icon interpreter still isn't fully iso-
morphic with the real one.
I'd love to teach an Icon course some day, but that's not the typical way in
which philologists normally find employment :-).
-Richard Goerwitz
goer@midway.uchicago.edu